K-similar strings¶
Time: O(NxN!div(C_A!x_xC_Z!)); Space: O(NxN!div(C_A!x_xC_Z!)); hard
Strings A and B are K-similar (for some non-negative integer K) if we can swap the positions of two letters in A exactly K times so that the resulting string equals B.
Given two anagrams A and B, return the smallest K for which A and B are K-similar.
Example 1:
Input: A = “ab”, B = “ba”
Output: 1
Example 2:
Input: A = “abc”, B = “bca”
Output: 2
Example 3:
Input: A = “abac”, B = “baca”
Output: 2
Example 4:
Input: A = “aabc”, B = “abca”
Output: 2
Notes:
1 <= len(A) == len(B) <= 20
A and B contain only lowercase letters from the set {‘a’, ‘b’, ‘c’, ‘d’, ‘e’, ‘f’}
Approach Framework¶
Explanation
We’ll call the underlying graph of the problem, the graph with 6 nodes ‘a’, ‘b’, …, ‘f’ and the edges A[i] -> B[i]. Our goal is for this graph to have only self-edges (edges of the form a -> a.) Let’s make some deductions about how swaps between A[i] and A[j] affect this graph, and the nature of optimal swap schedules.
If A = ‘ca…’ and B = ‘ab…’, then the first two edges of the underlying graph are c -> a and a -> b; and a swap between A[1] and A[0] changes these two edges to the single edge c -> b. Let’s call this type of operation ‘cutting corners’. Intuitively, our optimal swap schedule always increases the number of matches (A[i] == B[i]s) for each swap, so cutting corners is the only type of operation we need to consider. (This is essentially the happy swap assumption, proved in Couples Holding Hands)
Now consider any cycle decomposition of the underlying graph. (This decomposition (or the number of cycles), is not necessarily unique.) Through operations of cutting corners, we’ll delete all the (non-self) edges. Each cycle of length k requires k-1 operations to delete. Thus, the answer is just the minimum possible value of ∑(Ck − 1), where C1, … Ck are the lengths of the cycles in some cycle decomposition of the underlying graph. This can be re-written as (Number of non-self edges) - (Number of cycles). Hence, we want to maximize the number of cycles in a cycle decomposition of the underlying graph.
1. Brute Force with Dynamic Programming¶
Intuition and Algorithm Let P1, P2, … be possible cycles of the underlying graph GG. Let’s attempt to write G=∑kixPi for some constants ki. Then, the number of cycles is ∑ki. We can represent G and Pi as the number of directed edges from node X to Y. For example, if P1 is the cycle a -> b -> d -> e -> a, then P1 is a -> b plus b -> d plus d -> e plus e -> a. This can be represented as a column vector possibles[0] of 1s and 0s, with four 1s, (each possibles[0][i] == 1 represents the ith directed edge being there (and having quantity 1)). Similarly, the graph G can also be represented as a column vector.
This sets the stage for dynamic programming. For each graph G, represented as a column vector, say we want to find numCycles(G). We can take all possible cycles C, and check if G contains C. If it does, then a candidate answer is 1 + numCycles(G - C).
It should also be noted that maximizing the number of cycles cannot be done in a greedy way, ie. by always removing the lowest size cycle. For example, consider the graph with edges a -> b -> c -> a, b -> d -> e -> b, and c -> e -> f -> c. Those form cycles, and there is a fourth 3-cycle b -> c -> e -> b. If we remove that cycle first, then we would have only two cycles; but if we remove the first 3 cycles, then we would have three cycles.
[8]:
import itertools
class Solution1(object):
"""
Time: O(2^(N+W)), where N is the length of the string, and W is the length of the alphabet.
Space: O(2^(N+W))
"""
def kSimilarity(self, A, B):
"""
:type A: str
:type B: str
:rtype: int
if A == B:
return 0
N = len(A)
alphabet = 'abcdef'
pairs = [(a, b) for a in alphabet for b in alphabet if a != b]
index = {p: i for i, p in enumerate(pairs)}
count = [0] * len(index)
for a, b in zip(A, B):
if a != b:
count[index[a, b]] += 1
seen = set()
for size in range(2, len(alphabet) + 1):
for cand in itertools.permutations(alphabet, size):
i = cand.index(min(cand))
seen.add(cand[i:] + cand[:i])
possibles = []
for cand in seen:
row = [0] * len(alphabet) * (len(alphabet) - 1)
for a, b in zip(cand, cand[1:] + cand[:1]):
row[index[a, b]] += 1
possibles.append(row)
ZERO = tuple([0] * len(row))
memo = {ZERO: 0}
def solve(count):
if count in memo: return memo[count]
ans = float('-inf')
for row in possibles:
count2 = list(count)
for i, x in enumerate(row):
if count2[i] >= x:
count2[i] -= x
else: break
else:
ans = max(ans, 1 + solve(tuple(count2)))
memo[count] = ans
return ans
return sum(count) - solve(tuple(count))
[9]:
s = Solution1()
A = "ab"
B = "ba"
assert s.kSimilarity(A, B) == 1
A = "abc"
B = "bca"
assert s.kSimilarity(A, B) == 2
A = "abac"
B = "baca"
assert s.kSimilarity(A, B) == 2
A = "aabc"
B = "abca"
assert s.kSimilarity(A, B) == 2
2. Pruned Breadth First Search¶
Intuition Based on the underlying graph interpretation of the problem, we can prove that an optimal solution swaps the left-most unmatched character A[i] with an appropriate match A[j] == B[i] (j > i).
This reduces the number of “neighbors” of a node (string state) from O(N^2) to O(N), but it also focuses our search greatly. Each node searched with k matches, will have at most 2^k swaps on the unmatched characters. This leads to ∑k(N,k)2^k = 3^N checked states. Furthermore, some characters are the same, so we must divide the number of states by approximate factors of ∏(Ni)! (where Ni is the number of occurrences of the iith character in A.) With N <= 20, this means the number of states will be small.
Algorithm We’ll perform a regular breadth-first search. The neighbors to each node string S are all the strings reachable with 1 swap, that match the first unmatched character in S.
[10]:
import collections
import queue
class Solution2(object):
"""
Time: O(sum((k=0,n)nom(N, k)) * min(2^k,(N-k)!)/(W*((N-k)/)!})),
where N is the length of the string, and W is the length of the alphabet.
Space: (N∗t), where t is the time complexity given above.
"""
def kSimilarity(self, A, B):
"""
:type A: str
:type B: str
:rtype: int
def neighbors(S):
for i, c in enumerate(S):
if c != B[i]:
break
T = list(S)
for j in range(i+1, len(S)):
if S[j] == B[i]:
T[i], T[j] = T[j], T[i]
yield "".join(T)
T[j], T[i] = T[i], T[j]
queue = collections.deque([A])
seen = {A: 0}
while queue:
S = queue.popleft()
if S == B: return seen[S]
for T in neighbors(S):
if T not in seen:
seen[T] = seen[S] + 1
queue.append(T)
[11]:
s = Solution2()
A = "ab"
B = "ba"
assert s.kSimilarity(A, B) == 1
A = "abc"
B = "bca"
assert s.kSimilarity(A, B) == 2
A = "abac"
B = "baca"
assert s.kSimilarity(A, B) == 2
A = "aabc"
B = "abca"
assert s.kSimilarity(A, B) == 2
[12]:
import collections
class Solution3(object):
"""
Time: O(n * n!/(c_a!*...*c_z!), n is the length of A, B,
c_a...c_z is the count of each alphabet,
n = sum(c_a...c_z)
Space: O(n * n!/(c_a!*...*c_z!)
"""
def kSimilarity(self, A, B):
"""
:type A: str
:type B: str
:rtype: int
"""
def neighbors(s, B):
for i, c in enumerate(s):
if c != B[i]:
break
t = list(s)
for j in range(i+1, len(s)):
if t[j] == B[i]:
t[i], t[j] = t[j], t[i]
yield "".join(t)
t[j], t[i] = t[i], t[j]
q = collections.deque([A])
lookup = set()
result = 0
while q:
for _ in range(len(q)):
s = q.popleft()
if s == B:
return result
for t in neighbors(s, B):
if t not in lookup:
lookup.add(t)
q.append(t)
result += 1
[13]:
s = Solution3()
A = "ab"
B = "ba"
assert s.kSimilarity(A, B) == 1
A = "abc"
B = "bca"
assert s.kSimilarity(A, B) == 2
A = "abac"
B = "baca"
assert s.kSimilarity(A, B) == 2
A = "aabc"
B = "abca"
assert s.kSimilarity(A, B) == 2